Due to rapid data growth, statistical analysis of massive datasets often has to be carried out in a distributed fashion, either because several datasets stored in separate physical locations are all relevant to a given problem, or simply to achieve faster (parallel) computation through a divide-and-conquer scheme. In both cases, the challenge is to obtain valid inference that does not require processing all data at a single central computing node. We show that for a very widely used class of spatial low-rank models, which can be written as a linear combination of spatial basis functions plus a fine-scale-variation component, parallel spatial inference and prediction for massive distributed data can be carried out exactly, meaning that the results are the same as for a traditional, non-distributed analysis. The communication cost of our distributed algorithms does not depend on the number of data points. After extending our results to the spatio-temporal case, we illustrate our methodology by carrying out distributed spatio-temporal particle filtering inference on total precipitable water measured by three different satellite sensor systems.
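To illustrate why exact distributed inference with communication cost independent of the number of data points is possible for low-rank models, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes the model z = B η + (fine-scale variation + measurement error), with prior η ~ N(0, K), and that the combined fine-scale and error covariance V is diagonal, so it splits into independent blocks V_j, one per node. Under these assumptions each node j only needs to send the r × r matrix B_j' V_j^{-1} B_j and the length-r vector B_j' V_j^{-1} z_j; summing these gives exactly the posterior of η that a centralized analysis would produce. The Gaussian basis functions, the one-dimensional domain, K = I, the three simulated nodes, and all function names (`basis`, `local_summary`, `combine`, `centralized`) are hypothetical choices made for illustration.

```python
import numpy as np

def basis(locs, centers, scale=0.2):
    """Hypothetical Gaussian basis functions on [0, 1]; one column per center."""
    d = locs[:, None] - centers[None, :]
    return np.exp(-0.5 * (d / scale) ** 2)

def local_summary(z_j, B_j, v_j):
    """Run at node j: returns B_j' V_j^{-1} B_j (r x r) and B_j' V_j^{-1} z_j (length r).
    Assumes V_j is diagonal with entries v_j, so V_j^{-1} is an elementwise division."""
    W = B_j / v_j[:, None]
    return W.T @ B_j, W.T @ z_j

def combine(summaries, K):
    """Run at the central node: sum the node summaries and return the
    posterior mean and covariance of the basis-function weights eta."""
    R = sum(s[0] for s in summaries)
    g = sum(s[1] for s in summaries)
    post_cov = np.linalg.inv(np.linalg.inv(K) + R)
    return post_cov @ g, post_cov

def centralized(z, B, v, K):
    """Reference: posterior of eta from all n data at once, using the
    n x n matrix B K B' + V (kriging form) instead of node summaries."""
    S = B @ K @ B.T + np.diag(v)
    KBt = K @ B.T
    mean = KBt @ np.linalg.solve(S, z)
    cov = K - KBt @ np.linalg.solve(S, KBt.T)
    return mean, cov

rng = np.random.default_rng(0)
centers = np.linspace(0.0, 1.0, 10)
r = centers.size
K = np.eye(r)                      # assumed prior covariance of eta
eta = rng.standard_normal(r)       # one eta shared by all nodes

nodes = []
for n_j in (500, 800, 300):        # three hypothetical data-holding nodes
    locs = rng.uniform(0.0, 1.0, n_j)
    B_j = basis(locs, centers)
    v_j = rng.uniform(0.1, 0.5, n_j)    # fine-scale + measurement-error variances
    z_j = B_j @ eta + rng.normal(0.0, np.sqrt(v_j))
    nodes.append((z_j, B_j, v_j))

# Each node communicates only r*r + r numbers, regardless of n_j.
summaries = [local_summary(*node) for node in nodes]
dist_mean, dist_cov = combine(summaries, K)

# Centralized analysis on the pooled data, for comparison.
z = np.concatenate([n[0] for n in nodes])
B = np.vstack([n[1] for n in nodes])
v = np.concatenate([n[2] for n in nodes])
full_mean, full_cov = centralized(z, B, v, K)

print(np.allclose(dist_mean, full_mean), np.allclose(dist_cov, full_cov))
```

The agreement between the two results follows from the Sherman-Morrison-Woodbury identity: (K^{-1} + B' V^{-1} B)^{-1} equals K - K B' (B K B' + V)^{-1} B K, and when V is block diagonal across nodes, B' V^{-1} B and B' V^{-1} z decompose into sums of per-node terms.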